Multicode: A Truly Multilingual Approach to Text Encoding

Author

Muhammad F. Mudawwar

Abstract

[…] appropriate for use in different countries have increased the demand for a standard character set that can serve many different languages. Currently, the ASCII character set [1] is the world’s most widely accepted and used standard character set for computers, operating systems, compilers, and e-mail systems. However, while ASCII adequately represents English text, it does not address the problem of handling text in other languages. ASCII is a 7-bit code and defines only 128 characters. When ASCII is used with an 8-bit character format, the remaining 128 code values, which ASCII leaves unused, can serve as extensions to ASCII, and each extension defines the characters of a particular language or group of languages. For example, the ISO 8859 standard [2] defines a Latin extension that supports many European languages, as well as Cyrillic, Arabic, Greek, Hebrew, and other language extensions. To handle a document that mixes English with a second language, you must use the ASCII extension for that second language. However, this approach presents two key problems: […]
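
The 8-bit extension scheme described above can be illustrated with a short sketch. It assumes Python 3.9+ and uses Python's built-in codecs for five parts of ISO 8859 (`iso8859_1`, `iso8859_5`, `iso8859_6`, `iso8859_7`, `iso8859_8`); the helper names `ascii_range_is_shared` and `interpret_high_byte` are hypothetical, chosen only for this illustration. The sketch checks that the lower 128 code values mean the same thing in every part, then decodes one byte from the upper half under each part.

```python
# Sketch (hypothetical helpers): how ISO 8859 parts reuse the upper 128 byte
# values that 7-bit ASCII leaves unused. Codec names are Python built-ins.

ISO_8859_PARTS = {
    "Latin-1": "iso8859_1",   # ISO 8859-1
    "Cyrillic": "iso8859_5",  # ISO 8859-5
    "Arabic": "iso8859_6",    # ISO 8859-6
    "Greek": "iso8859_7",     # ISO 8859-7
    "Hebrew": "iso8859_8",    # ISO 8859-8
}


def ascii_range_is_shared() -> bool:
    """Return True if bytes 0x00-0x7F decode identically in ASCII and every part."""
    low = bytes(range(128))
    reference = low.decode("ascii")
    return all(low.decode(codec) == reference for codec in ISO_8859_PARTS.values())


def interpret_high_byte(value: int) -> dict[str, str]:
    """Decode a single byte from the 8-bit extension range under each part."""
    if not 0x80 <= value <= 0xFF:
        raise ValueError("expected a byte in the extension range 0x80-0xFF")
    raw = bytes([value])
    return {name: raw.decode(codec) for name, codec in ISO_8859_PARTS.items()}


if __name__ == "__main__":
    print("ASCII half shared by all parts:", ascii_range_is_shared())
    for script, char in interpret_high_byte(0xE4).items():
        print(f"0xE4 as {script}: {char!r} (U+{ord(char):04X})")
```

Because the same byte maps to a different letter in each part, a byte stream labeled with one extension has no code values left for a second non-Latin script. The sketch only shows this ambiguity; it does not implement the Multicode scheme itself.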

Similar Resources

Interactive multilingual text generation for a monolingual user

In this paper we describe an approach to machine translation which involves multilingual text generation via interaction with the user, who is monolingual: the system will work in a specific and fairly restricted domain. Key techniques used include the use of examples rather than linguistic rules to give the equivalents between the languages, and the encoding of contextual knowledge in the form...

Chapter 4 Character encoding in corpus construction

Corpus linguistics has developed, over the past three decades, into a rich paradigm that addresses a great variety of linguistic issues ranging from monolingual research of one language to contrastive and translation studies involving many different languages. Today, while the construction and exploitation of English language corpora still dominate the field of corpus linguistics, corpora of ot...

Against multilinguality

1. Introduction. An obvious assumption of the present workshop is that multilingual corpora are useful, and should be built and investigated. In the present paper, I would like to point out that this is far from straightforward and actually remains to be proved. In addition, and in a more constructive vein, I want to present some examples that show that the right encoding depends crucially on wh...

Towards a Language Independent Encoding of Documents: A Novel Approach to Multilingual Question Answering

Given source text in several languages, can one answer queries in some other language, without translating any of the sources into the language of the questioner? In this paper we try to address this question as we report our work on a restricted domain, multilingual Question – Answering system, with current implementations for source text in English and questions posed in English and Hindi. Th...

A truly multilingual, high coverage, accurate, yet simple, subsentential alignment method

This paper describes a new alignment method that extracts high quality multi-word alignments from sentence-aligned multilingual parallel corpora. The method can handle several languages at once. The phrase tables obtained by the method have a comparable accuracy and a higher coverage than those obtained by current methods. They are also obtained much faster.

Journal: IEEE Computer

Volume: 30

Publication date: 1997
